
[smoke][bugfix] moe_init_routing_v2 active_expert_range use int type#5521

Merged: wangxiyuan merged 1 commit into vllm-project:main from shenchuxiaofugui:smoke_1230 on Dec 31, 2025


shenchuxiaofugui (Collaborator) commented Dec 30, 2025

What this PR does / why we need it?

In the allgather dispatch path, the float kernel of moe_init_routing_v2 does not accept a tensor for `active_expert_range`; it only accepts ints.
PR #5311 unified the variables `local_num_experts` and `self.local_num_experts` by using `self.local_num_experts` throughout. As a side effect, an integer parameter used later in this path became a tensor, which the kernel does not support.
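To make the type requirement concrete, here is a minimal sketch of building an all-int expert range for one expert-parallel rank. The helper name `local_expert_range`, the contiguous even split of experts across ranks, and the `[start, end)` layout are assumptions for illustration; the authoritative shape of `active_expert_range` is defined by the kernel's own documentation.

```python
def local_expert_range(ep_rank: int, local_num_experts: int) -> list[int]:
    """Return the [start, end) expert indices owned by one rank as plain ints.

    Hypothetical helper: assumes experts are split evenly and contiguously
    across expert-parallel ranks.
    """
    # Reject tensors, floats, etc. early, since the kernel only takes ints.
    if not isinstance(ep_rank, int) or not isinstance(local_num_experts, int):
        raise TypeError("ep_rank and local_num_experts must be Python ints")
    start = ep_rank * local_num_experts
    return [start, start + local_num_experts]


# Rank 1 of an EP group where each rank owns 64 experts.
print(local_expert_range(1, 64))  # [64, 128]
```

Validating the type at the boundary surfaces a tensor-vs-int regression as a clear Python error instead of a kernel failure.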

Does this PR introduce any user-facing change?

How was this patch tested?

| Task | Metric | Ground truth | Measured | Result |
|---|---|---|---|---|
| gsm8k | exact_match,strict-match | 0.89 | 0.8939 | success ✅ |
| gsm8k | exact_match,flexible-extract | 0.85 | 0.856 | success ✅ |
| ceval-valid | acc,none | 0.84 | 0.8373 | success ✅ |

Model Parameters:
`{'pretrained': 'Qwen/Qwen3-30B-A3B', 'tensor_parallel_size': 2, 'dtype': 'auto', 'trust_remote_code': False, 'max_model_len': 4096, 'gpu_memory_utilization': 0.6, 'enable_expert_parallel': True}`
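The ceval-valid result is reported as a success even though the measured score (0.8373) is below the ground truth (0.84), which suggests the accuracy check allows some tolerance. The PR does not state the actual rule, so the sketch below assumes a hypothetical absolute tolerance of 0.01 purely to show how such a pass/fail comparison could work on the numbers above.

```python
# Results copied from the PR description.
RESULTS = [
    ("gsm8k", "exact_match,strict-match", 0.89, 0.8939),
    ("gsm8k", "exact_match,flexible-extract", 0.85, 0.856),
    ("ceval-valid", "acc,none", 0.84, 0.8373),
]

# Hypothetical tolerance: the real CI threshold is not stated in the PR.
ABS_TOLERANCE = 0.01


def passes(ground_truth: float, measured: float, tol: float = ABS_TOLERANCE) -> bool:
    """A run passes if it is no more than `tol` below the reference score."""
    return measured >= ground_truth - tol


for task, metric, gt, measured in RESULTS:
    status = "success" if passes(gt, measured) else "FAIL"
    print(f"{task} | {metric}: ground_truth={gt} | measured={measured} | {status}")
```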

gemini-code-assist (Contributor, bot) left a comment:

Code Review

This pull request addresses a bug where num_local_experts could be a torch.Tensor, causing a type error in the npu_moe_init_routing_v2 kernel which expects an integer for its active_expert_range parameter. The fix correctly handles this by checking if num_local_experts is a tensor and extracting its value with .item(), or casting it to an integer otherwise. This change is correct and effectively resolves the issue.
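The review describes the fix as: if `num_local_experts` is a tensor, take its value with `.item()`; otherwise cast it with `int()`. Below is a minimal, dependency-free sketch of that normalization. It duck-types on `.item()` instead of checking `isinstance(x, torch.Tensor)` so it runs without torch installed; that substitution, the helper name `as_python_int`, the `FakeScalarTensor` stand-in, and the illustrative `[0, n]` range are assumptions, not the merged code.

```python
from typing import Any


def as_python_int(value: Any) -> int:
    """Convert a 0-d tensor-like or int-like value to a plain Python int.

    Follows the described fix: anything exposing .item() (such as a 0-d
    tensor) is unwrapped first; the result then goes through int().
    """
    if hasattr(value, "item"):
        value = value.item()
    return int(value)


class FakeScalarTensor:
    """Hypothetical stand-in for a 0-d torch.Tensor, used only for the demo."""

    def __init__(self, v: int) -> None:
        self._v = v

    def item(self) -> int:
        return self._v


local_num_experts = FakeScalarTensor(64)
# Illustrative range only; the real start/end come from the caller's context.
active_expert_range = [0, as_python_int(local_num_experts)]
print(active_expert_range)  # [0, 64]
```

Normalizing right before the kernel call keeps the rest of the code free to hold the value as either type.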

shenchuxiaofugui changed the title from "[smoke][bugfix] moe_init_routing_v2 use int type" to "[smoke][bugfix] moe_init_routing_v2 active_expert_range use int type" on Dec 30, 2025
github-actions (Contributor) commented:

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
vllm-ascend-ci added and then removed the accuracy-test label ("enable all accuracy test for PR") on Dec 30, 2025
wangxiyuan merged commit bdc721d into vllm-project:main on Dec 31, 2025
42 of 48 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 31, 2025
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [feature] mooncake support pcp/dcp in common conditions (vllm-project#5224)
  [Bugfix] Fix mm_merge (vllm-project#5249)
  [Main2Main] Upgrade vllm commit to 1230 (vllm-project#5495)
  [Feature] Refactor PCP &DCP related code (vllm-project#5214)
  [main][test] Refactor the mtp and eagle test case (vllm-project#5326)
  [smoke][bugfix] moe_init_routing_v2 active_expert_range use int type (vllm-project#5521)
  [2/N] Upgrade nightly doc (vllm-project#5534)
  [Doc] Add new contributors. (vllm-project#5537)
  [3/N][Nightly] Move ops tests to nightly (vllm-project#5538)
wangyibo1005 pushed a commit to wangyibo1005/vllm-ascend that referenced this pull request Dec 31, 2025
…llm-project#5521)

### What this PR does / why we need it?
The float kernel of moe_init_routing_v2 in the allgather dispatch path
does not accept a tensor for `active_expert_range`; it only accepts
ints. PR #5311 unified the variables `local_num_experts` and
`self.local_num_experts` by using `self.local_num_experts` throughout,
which turned an integer parameter used later in this path into a tensor.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
gsm8k | exact_match,strict-match: ground_truth=0.89 | measured=0.8939 | success=✅
gsm8k | exact_match,flexible-extract: ground_truth=0.85 | measured=0.856 | success=✅
ceval-valid | acc,none: ground_truth=0.84 | measured=0.8373 | success=✅
Model Parameters:
{'pretrained': 'Qwen/Qwen3-30B-A3B', 'tensor_parallel_size': 2, 'dtype': 'auto', 'trust_remote_code': False, 'max_model_len': 4096, 'gpu_memory_utilization': 0.6, 'enable_expert_parallel': True}

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@45c1ca1

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>
Rozwel-dx pushed a commit to Rozwel-dx/vllm-ascend that referenced this pull request Jan 8, 2026
shenchuxiaofugui deleted the smoke_1230 branch on January 12, 2026 10:51
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026

Labels: accuracy-test (enable all accuracy test for PR), module:ops, ready (ready for review), ready-for-test (start test by label for PR)

4 participants